Evaluating comprehension of natural and synthetic conversational speech
Abstract
Current speech synthesis methods typically operate on isolated sentences and lack convincing prosody when generating longer segments of speech. Similarly, prevailing TTS evaluation paradigms, such as intelligibility (transcription word error rate) or mean opinion score (MOS), only score sentences in isolation, even though overall comprehension is arguably more important for speech-based communication. In an effort to develop more ecologically relevant evaluation techniques that go beyond isolated sentences, we investigated comprehension of natural and synthetic speech dialogues. Specifically, we tested listener comprehension on long segments of spontaneous and engaging conversational speech (three 10-minute radio interviews of comedians). Interviews were reproduced either as natural speech, synthesised from carefully prepared transcripts, or synthesised using durations from forced alignment against the natural speech, all in a balanced design. Comprehension was measured using multiple-choice questions. A significant difference was found between the comprehension/retention of natural speech (74% correct responses) and synthetic speech with forced-aligned durations (61% correct responses). However, no significant difference was observed between natural and regular synthetic speech (70% correct responses). Effective evaluation of comprehension remains elusive.
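The abstract says only that the three interviews and three presentation conditions were arranged "in a balanced design". One common way to achieve such balance is a cyclic Latin square, sketched below as an assumption rather than the authors' actual procedure. The interview and condition labels and both function names are hypothetical, and the scoring helper simply turns multiple-choice responses into percent correct per condition, the measure the abstract reports.

```python
# Hypothetical labels: the abstract names three interviews and three
# presentation conditions but not how listeners were assigned to them.
INTERVIEWS = ["interview_1", "interview_2", "interview_3"]
CONDITIONS = ["natural", "synthetic", "synthetic_aligned"]


def latin_square_assignment(interviews, conditions):
    """Return one playlist per listener group using a cyclic Latin square.

    Group g hears interview i in condition (i + g) mod n, so each group
    hears every interview once and every condition once, and across all
    groups each interview is presented in each condition exactly once.
    """
    n = len(conditions)
    if len(interviews) != n:
        raise ValueError("a square design needs as many interviews as conditions")
    return [
        [(interviews[i], conditions[(i + g) % n]) for i in range(n)]
        for g in range(n)
    ]


def percent_correct_by_condition(responses):
    """Score multiple-choice answers given as (condition, is_correct) pairs."""
    totals, correct = {}, {}
    for condition, is_correct in responses:
        totals[condition] = totals.get(condition, 0) + 1
        correct[condition] = correct.get(condition, 0) + int(is_correct)
    return {c: 100.0 * correct[c] / totals[c] for c in totals}


if __name__ == "__main__":
    # Print the hypothetical playlist for each listener group.
    for g, playlist in enumerate(latin_square_assignment(INTERVIEWS, CONDITIONS)):
        print(f"group {g}: {playlist}")
```

Rotating conditions this way keeps interview content from being confounded with condition, so a difference in percent correct between conditions is less likely to reflect one interview simply being easier than another.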
Similar resources
Synthesis and evaluation of conversational characteristics in speech synthesis
Conventional synthetic voices can synthesise neutral read aloud speech well. But, to make synthetic speech more suitable for a wider range of applications, the voices need to express more than just the word identity. We need to develop voices that can partake in a conversation and express, e.g. agreement, disagreement, hesitation, in a natural and believable manner. In speech synthesis there ar...
Comprehension of KTH text-to-speech with "listening speed" paradigm
The comprehension of natural and synthetic speech in Swedish and American English was investigated using a sentence-by-sentence listening paradigm. The synthesized speech was generated by the KTH text-to-speech systems. Results indicated that sentence listening times were significantly longer only for American English synthetic speech as compared to natural speech. Text difficulty was found to ...
Comprehension of synthesized speech while driving and in the lab
Two studies were conducted to measure the comprehensibility of synthetic speech with current text-to-speech technology. Baseline measurements for each subject were obtained using recorded natural speech. The first study was conducted in a quiet lab with no distractions. Half the subjects were allowed to take notes while listening, the other half were not. Findings showed that there was no signif...
Perception and Comprehension of Synthetic Speech
An extensive body of research on the perception of synthetic speech carried out over the past 30 years has established that listeners have much more difficulty perceiving synthetic speech than natural speech. Differences in perceptual processing have been found in a variety of behavioral tasks, including assessments of segmental intelligibility, word recall, lexical decision, sentence transcrip...
Comprehension of Speech Presented at Synthetically Accelerated Rates: Evaluating Training and Practice Effects
The ability to monitor multiple sources of concurrent auditory information is an integral component of Navy watchstanding operations. However, this leads to attentionally demanding environments. The present study tested the utility of a potential solution to listening to multiple speech communications in an auditory display environment: presenting speech serially at synthetically accelerated ra...
Publication date: 2016